Abstract

In this paper we consider a natural probabilistic variation of the
classical minimum spanning tree (MST) problem, which we call the
probabilistic minimum spanning tree problem (PMST). In particular,
we consider the case where not all the points are deterministically
present, but are present with certain probability. We discuss the
applications of the PMST and find a closed form expression for the
expected length of a given spanning tree. Based on these expressions
we prove that the problem is NP-complete. We further examine
some combinatorial properties of the problem, establish the relation
of the PMST problem with the MST problem and the network design
problem and examine some cases where the problem is solvable
in polynomial time.


Key words: Probabilistic combinatorial optimization problems, minimum
spanning tree, network design, NP-completeness.


1      Introduction 

The classical minimum spanning tree (MST) problem plays an important role
in combinatorial optimization. It possesses the matroidal property which
allows the greedy algorithm to solve the problem optimally, and thus it is the
prototype for problems solvable in polynomial time. For a summary of its
properties and algorithms for its solution see Papadimitriou and Steiglitz [7].
From a practical point of view, it has important applications in
transportation, communications, distribution systems, etc.

In this paper we consider a natural probabilistic variation of this classical
problem. In particular, we consider the case where not all the points are
deterministically present, but are present with certain probability. Formally,
given a weighted graph $G = (V, E)$ and a probability of presence $p(S)$ for
each subset $S$ of $V$, we want to construct an a priori spanning tree of
minimum expected length in the following sense: on any given instance of the
problem, delete from the tree those absent vertices (together with their
adjacent edges) whose removal leaves the tree connected. The problem of finding
an a priori spanning tree of minimum expected length is the probabilistic
minimum spanning tree (PMST) problem. In order to clarify the definition
of the PMST problem, consider the example in Figure 1. If the a priori tree
is $T$ and nodes 2, 7, 9 are the only ones not present, the tree becomes $T_1$.
One can easily observe that if every node is present with probability $p_i = 1$
for all $i \in V$, then the problem reduces to the classical MST problem.

Figure 1: The PMST methodology

This paper is part of a more general investigation of the properties of
combinatorial optimization problems when instances are modified
probabilistically. Interest in this class of problems started with the Ph.D.
thesis of Jaillet [4] on the probabilistic traveling salesman problem (PTSP),
where he posed the problem, examined some of its combinatorial properties and
proved asymptotic theorems in the plane. In Bertsimas [1] further properties of
the PTSP are derived and the results of Jaillet [4] are sharpened. Bertsimas
[1] also includes results on the probabilistic vehicle routing problem, defined
in Jaillet and Odoni [5], and on probabilistic facility location problems. To
our knowledge, the PMST problem has never been defined before in the
literature despite its intrinsic interest as well as its applicability. In
Bertsimas [2], which is a sequel to the present paper, we perform a
probabilistic analysis of the PMST and prove that, surprisingly, the PMST
problem is asymptotically equivalent to the strategy of re-optimization, in
which we re-optimize every instance of the problem.

In the next section we discuss some applications of the PMST problem,
while in section 3 we address the question of finding an explicit expression
for the expected length of an a priori tree $T$. In section 4 we investigate
the complexity of the problem and we prove that even a restricted version of
the problem with all weights equal is NP-complete, which in view of the
simplicity of the MST problem is a quite surprising result. In section 5 we
examine some combinatorial properties of the problem (bounds, relation of
PMST and MST, relation with re-optimization strategies and the network
design problem). In section 6 we exploit these combinatorial properties of
the problem and find some special cases which are solvable in polynomial
time. The final section includes some concluding remarks.

2      Discussion  and  Applications  of  the  PMST 
Problem 

It is natural to ask why the PMST problem is interesting. We first observe
that the problem defines an efficient strategy to update minimum spanning
tree solutions when problem instances are modified probabilistically because
of the absence of certain nodes from the graph. We denote this strategy with
$\Sigma_{T_p}$, where $T_p$ is the optimal a priori tree. Then in the instance
$S$, i.e. when only nodes in the set $S$ are present, the strategy produces a
tree $T_p(S)$ with length $L_{T_p}(S)$, which is the length of the tree that
connects nodes from the set $S$ of present nodes using parts of $T_p$. In the
context of this discussion the letter $\Sigma$ denotes the strategy used.
Two other possible strategies are:

1. A re-optimization strategy $\Sigma_{MST}$, in which we find the minimum
spanning tree (MST) of the set of present nodes in every instance. We
denote with $L_{MST}(S)$ the length of the MST of the nodes in the set $S$.

2. A re-optimization strategy $\Sigma_{STEINER}$, in which we find the minimum
Steiner tree of the set of present nodes in every instance. We denote
with $L_{STEINER}(S)$ the length of the Steiner tree of the nodes in the set
$S$, using possibly nodes from the set $V - S$.

Remark: The above definition of the re-optimization strategy
$\Sigma_{STEINER}$ applies only for the case of a fixed network, as opposed to
the case where the points are located in the Euclidean plane. In the latter
case $L_{STEINER}(S)$ is the length of the Steiner tree in the plane of the
points from the set $S$.

Why don't we use these re-optimization strategies $\Sigma_{MST}$,
$\Sigma_{STEINER}$, rather than the strategy $\Sigma_{T_p}$ we are proposing?

Concerning the $\Sigma_{STEINER}$ strategy, it is clear that
$L_{STEINER}(S) \le L_{T_p}(S)$, because the tree connecting the set $S$ using
only parts of the tree $T_p$ is also a solution to the Steiner problem. The
disadvantage of the Steiner strategy is that we have to solve an NP-hard
problem in every instance, something which is feasible only for small problem
instances.

With the strategy $\Sigma_{MST}$ it is clear that we can compute $L_{MST}(S)$
in $O(|S|^2)$, using the greedy algorithm, but it is not clear that
$L_{MST}(S) \le L_{T_p}(S)$. In fact in section 5 we construct examples where
the probabilistic strategy $\Sigma_{T_p}$ we are proposing is better than
$\Sigma_{MST}$. Furthermore, in [2] we prove that asymptotically, under
reasonable probabilistic assumptions, the probabilistic strategy
$\Sigma_{T_p}$ is at least as good as $\Sigma_{MST}$.


What is more important is the fact that in many applications we need a
real-time strategy to modify the solution when the instances are modified.
Clearly, the PMST strategy satisfies this criterion, since the tree $T(S)$ can
be found in $O(n)$ time as follows:

1. Start with the a priori tree $T$.

2. Until there are no unmarked leaves in $T$:
find an unmarked leaf $i$ in $T$;
if $i \in S$ mark it; else delete $i$ from $T$.

3. The resulting tree is the tree $T(S)$.

Since we are only looking at every node at most once, this is an $O(n)$
algorithm. Note that the two re-optimization strategies are superlinear. In
addition, we may not have the computer resources to re-optimize. An even more
important motivation in favor of the $\Sigma_{T_p}$ strategy is that this
strategy does not change the underlying network structure, while both
re-optimization strategies can result in a completely different network
structure for different problem instances, by adding new edges and deleting
old ones. In a communication network for example, it may be very expensive or
even impractical to create new communication links for each problem instance.
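
The leaf-pruning procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the a priori tree is assumed to be given as a dictionary mapping each node to the set of its neighbours, and the names `prune_tree`, `adj` and `present` are hypothetical.

```python
from collections import deque

def prune_tree(adj, present):
    """Return the adjacency of T(S): the part of the a priori tree T that is
    needed to connect the present nodes.

    adj:     dict node -> set of neighbours in T (assumed to describe a tree)
    present: set S of present nodes
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    # Absent nodes of degree <= 1 are the leaves that must be deleted.
    queue = deque(v for v, nbrs in adj.items()
                  if len(nbrs) <= 1 and v not in present)
    while queue:
        v = queue.popleft()
        if v not in adj:          # already deleted
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            # Deleting v may turn its neighbour into a new absent leaf.
            if len(adj[u]) <= 1 and u not in present:
                queue.append(u)
    return adj
```

Each node enters the queue a bounded number of times and each edge is removed at most once, so the sketch stays linear in the size of the tree. Absent nodes that lie between present nodes are kept, in line with the definition of $T(S)$.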

After this discussion of the various strategies available when problem
instances are modified, let us consider some potential areas of application of
the PMST problem. Consider for example a VLSI environment. Suppose that on
a circuit there are $n$ processors subject to failure and processor $i$ becomes
inactive with probability $p_i$. Then we would like to connect the active
processors using a spanning tree structure which minimizes the manufacturing
cost. Communication of two active processors through some inactive
processors means that the inactive processors allow communication. Since in
this example changing the underlying network structure is impractical, the
PMST strategy is a good solution to the problem.

In a communication network, nodes may represent communication centers,
arcs represent communication links and link costs are the communication
costs among centers. The probability of failure $p_i$ is the probability of
blocked communication in center $i$. If the centers are blocked, they can only
be used to establish communication between unblocked centers. Then the
problem of finding an a priori network structure of minimum expected cost
is the PMST problem.

A more unusual application of the PMST problem is in the area of
organizational structures. For instance, a rather intriguing paradigm for the
PMST problem in the area of organizational structure design might be the
following: Suppose the $n$ points that we wish to interconnect represent our
agents or spies in a foreign country. They will undertake in the future a series
of missions, each mission involving a different subset of agents. A mission,
in our context, is an instance of the problem. We are looking for an a priori
organizational structure in which, for obvious reasons, each agent will know
only the people immediately above or below him/her in the structure; this
implies a spanning-tree-like structure. The probability $p_i$ associated with
point $i$ is the a priori probability that agent $i$ will have to participate in
any random mission undertaken by the network. For any given mission, only
that part of the organization which is necessary to interconnect all the agents
participating in that particular mission is activated. (For example, if the tree
$T$ of Figure 1 represents the network and agents 1, 3, 5, 6 and 8 must be
involved in a particular mission, the tree $T_1$ of Figure 1 represents the
network and subset of agents that will be activated.) The distance between
points $i$ and $j$ is interpreted as the cost or risk of exposure incurred when
agents $i$ and $j$ must communicate or work with each other. Given $p_i$ for
$i = 1, 2, \ldots, n$ and the distance matrix for all possible pairs $(i, j)$,
the PMST gives the organizational structure which, in the expected value sense,
minimizes the risk of exposure of the network on a random mission.

In a routing context, a company may want to connect all demand
locations, using a tree-like structure. The cost between demand locations can
represent transportation cost and the probability $p_i$ represents the
probability of having a demand at location $i$. The company would like to find
an a priori spanning tree for the demand locations, so as to minimize the
expected transportation cost.

Other  areas  of  potential  application  can  be  in  the  areas  of  transportation 
and  of  strategic  planning. 

One might object that all the examples we have discussed represent some
idealization of reality. Nevertheless, the PMST is a generic problem, which
in many applications where a particular type of randomness is present can
be a more appropriate model than the classical MST. It also addresses the
question of finding a spanning tree which is optimal on the average, rather
than a solution which is optimal on a particular instance. The essential
characteristic therefore of the PMST problem is that it is a more global
problem than the MST problem; moreover, the optimal solution to the PMST
is a robust solution.

Unfortunately, as we prove in section 4, one pays for these nice properties
(robustness, globality) by changing the complexity of the problem radically.
While the MST problem is easily solvable, the PMST problem is NP-hard.

3      The  Expected  Length  of  a  Given  Spanning 
Tree 

As we noted in the previous section the PMST problem defines an efficient
strategy for updating spanning tree solutions when problem instances are
modified probabilistically in response to the absence of certain nodes from
the graph. Given an a priori tree $T$ we define $L_T(S)$ to be the length of the
tree which connects nodes from the set $S$ of present nodes using only parts
of $T$. For example in Figure 1, $S = \{1, 3, 5, 6, 8\}$ and $L_T(S)$ is the
length of the tree $T_1$.

Then if the set $S$ of points present has probability $p(S)$, the problem can be
defined formally as follows:
Problem definition:

Given a graph $G = (V, E)$, not necessarily complete, a cost function
$c : E \to R$ and a probability function $p : 2^V \to [0, 1]$, we want to find a
tree $T$ that minimizes the expected length $E[L_T]$:

$$E[L_T] = \sum_{S \subseteq V} p(S)\, L_T(S), \qquad (1)$$

where the summation is taken over all subsets of $V$, the instances of the
problem.
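
For very small instances, definition (1) can be evaluated directly by enumerating all subsets of $V$. The Python sketch below is only an illustration under stated assumptions: the tree is given as a list of `(u, v, cost)` triples, the probability function is passed as a callable on frozensets, and the names `tree_length_on_subset`, `expected_length_bruteforce` and `prob` are hypothetical.

```python
from itertools import combinations

def tree_length_on_subset(tree_edges, present):
    """L_T(S): length of the part of T needed to connect the present nodes."""
    adj = {}
    for u, v, c in tree_edges:
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    # Repeatedly delete absent nodes of degree <= 1.
    stack = [v for v in adj if len(adj[v]) <= 1 and v not in present]
    while stack:
        v = stack.pop()
        if v not in adj:
            continue
        for u in adj.pop(v):
            del adj[u][v]
            if len(adj[u]) <= 1 and u not in present:
                stack.append(u)
    # Every surviving edge is stored twice, once per endpoint.
    return sum(c for nbrs in adj.values() for c in nbrs.values()) / 2

def expected_length_bruteforce(nodes, tree_edges, prob):
    """Evaluate (1): the sum over all subsets S of prob(S) * L_T(S)."""
    total = 0.0
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            s = frozenset(subset)
            total += prob(s) * tree_length_on_subset(tree_edges, s)
    return total
```

As a usage example, for the path 0-1-2 with unit costs and every subset equally likely, `expected_length_bruteforce([0, 1, 2], [(0, 1, 1.0), (1, 2, 1.0)], lambda s: 1 / 8)` sums six units of length over eight instances. The enumeration grows exponentially with the number of nodes, which is what motivates looking for closed forms.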

Note that at this level of generality we can model dependencies among the
probabilities of presence of sets of nodes. An additional observation is that
with this formulation one would need $O(n 2^n)$ effort ($|V| = n$) to compute
the expected length of a given tree $T$. We would like to be able to compute
$E[L_T]$ efficiently. The question we address in this section is for which
probability functions $p(S)$ we can compute $E[L_T]$ efficiently for a given
tree $T$.
If we define

$$h(S) = \Pr\{\text{none of the nodes in } S \text{ is present}\} = \sum_{R \subseteq V - S} p(R),$$

then we have the following result.
Theorem 1
Given an a priori tree $T$, its expected length is given by the expression

$$E[L_T] = \sum_{e \in T} c(e)\{1 - h(K_e) - h(V - K_e) + h(V)\}, \qquad (2)$$

where $K_e$, $V - K_e$ are the subsets of nodes contained in the two subtrees
obtained from $T$ by removing the edge $e$ (see Figure 2).
Figure 2: The sets $K_e$, $V - K_e$

Proof:

Given a tree $T$ let us consider how much each edge $e \in T$ contributes to
$E[L_T]$. By the definition of the problem only the edges in $T$ contribute to
this expectation. If we define the events
$A(K_e)$ = at least one node in $K_e$ is present,
then the contribution of every edge $e$ is

$$c(e) \Pr\{A(K_e) \cap A(V - K_e)\},$$

because the edge $e$ is used if and only if there exists at least one node
present in $K_e$ and at least one node present in $V - K_e$. As a result,

$$E[L_T] = \sum_{e \in T} c(e) \Pr\{A(K_e) \cap A(V - K_e)\}.$$

But

$$\Pr\{A(K_e) \cap A(V - K_e)\} = \Pr\{[A^c(K_e) \cup A^c(V - K_e)]^c\} = 1 - \Pr\{A^c(K_e) \cup A^c(V - K_e)\}$$

$$= 1 - \Pr\{A^c(K_e)\} - \Pr\{A^c(V - K_e)\} + \Pr\{A^c(K_e) \cap A^c(V - K_e)\}.$$

But since $\Pr\{A^c(K_e)\} = \Pr\{\text{none of the nodes in } K_e \text{ is present}\} = h(K_e)$,
and the event $A^c(K_e) \cap A^c(V - K_e)$ is the event that no node of $V$ is
present, which has probability $h(V)$, we easily obtain (2). •

Thus if instead of the probability function $p(S)$ we are given the function
$h(S)$ we can compute $E[L_T]$ for any given tree $T$ in $O(n)$, assuming we can
compute $h(S)$ in $O(1)$, since we can find all the sets $K_e$ for all $e$ in
$O(n)$ by starting the computation at the leaves. An interesting case, and
important in practice, is when the nodes are present independently. Then we can
find an explicit expression for $E[L_T]$.

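Theorem 2
If node $i$ is present with probability $p_i$, independently of the other
nodes, then the expectation $E[L_T]$ of a given tree $T$ is given by the
expression

$$E[L_T] = \sum_{e \in T} c(e)\Big\{1 - \prod_{i \in K_e}(1 - p_i)\Big\}\Big\{1 - \prod_{i \in V - K_e}(1 - p_i)\Big\}. \qquad (3)$$

Proof:
In this case, because nodes are present independently,

$$h(S) = \prod_{i \in S}(1 - p_i).$$

Substituting the above expression in (2), and noting that
$h(V) = h(K_e)\,h(V - K_e)$, we easily obtain (3). •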

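From (3) we can compute $E[L_T]$ in $O(n^2)$, since we can compute $h(S)$ in
$O(|S|)$. By organizing the computation carefully, we can compute $E[L_T]$ in
$O(n)$ as follows:

1. Let $a = \prod_{i \in V}(1 - p_i)$; let $a_i = 1$; let MARKED = set of leaves.

2. Until the node set is empty:
if $i$ is a leaf, let $a_i = (1 - p_i) \prod_{j \in MARKED,\ (i,j) \in T} a_j$;
add $i$ to the set MARKED; delete $i$ from $T$.

3. $E[L_T] = \sum_{e = (i,j) \in T} c(e)(1 - a_i)(1 - a/a_i)$, where for each
edge $e = (i, j)$ the node $i$ is the endpoint deleted first, so that
$a_i = h(K_e)$ and $a/a_i = h(V - K_e)$.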
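
A linear-time evaluation of formula (3) can be sketched in Python as shown below. This is an illustrative variant rather than a transcription of the three steps above: it roots the tree at node 0 and accumulates subtree products from the leaves upward, and it tracks zero factors separately so that the division $a/a_i$ never divides by zero when some $p_i = 1$ (an addition of this sketch, not of the text). The function name `expected_length`, the node labels `0..n-1` and the `(u, v, cost)` edge format are assumptions.

```python
def expected_length(n, tree_edges, p):
    """E[L_T] from formula (3) for a tree on nodes 0..n-1.

    tree_edges: list of (u, v, cost); p[i]: probability that node i is present,
    with nodes present independently.
    """
    adj = [[] for _ in range(n)]
    for u, v, c in tree_edges:
        adj[u].append((v, c))
        adj[v].append((u, c))

    # Root the tree at node 0; `order` lists every parent before its children.
    parent = [-1] * n
    parent_cost = [0.0] * n
    order = []
    seen = [False] * n
    seen[0] = True
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, c in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                parent_cost[v] = c
                stack.append(v)

    # For the subtree below each node keep the product of the nonzero factors
    # (1 - p_j) and the number of zero factors.
    prod = [1.0] * n
    zeros = [0] * n
    for v in range(n):
        q = 1.0 - p[v]
        if q == 0.0:
            zeros[v] = 1
        else:
            prod[v] = q
    for v in reversed(order):            # children before parents
        if parent[v] >= 0:
            prod[parent[v]] *= prod[v]
            zeros[parent[v]] += zeros[v]

    total = 0.0
    for v in range(n):
        if parent[v] < 0:
            continue
        h_below = 0.0 if zeros[v] else prod[v]                        # h(K_e)
        h_rest = 0.0 if zeros[0] - zeros[v] else prod[0] / prod[v]    # h(V - K_e)
        total += parent_cost[v] * (1.0 - h_below) * (1.0 - h_rest)
    return total
```

With all $p_i = 1$ the function returns the plain length of the tree, matching the observation that the problem then reduces to the classical MST setting.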




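An important special case is when $p_i = p$ for all $i$. Then $E[L_T]$ becomes

$$E[L_T] = \sum_{e \in T} c(e)\{1 - (1-p)^{|K_e|}\}\{1 - (1-p)^{n - |K_e|}\}. \qquad (4)$$

If we define

$$\phi(k) = \{1 - (1-p)^k\}\{1 - (1-p)^{n-k}\}, \qquad (5)$$

then

$$E[L_T] = \sum_{e \in T} c(e)\,\phi(|K_e|). \qquad (6)$$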

Based on these closed form expressions we will prove in the next section that
the decision version of the PMST problem, even with $p_i = p$ for all $i$ and
$c(e) = 1$, is NP-complete. Expression (6) is also important because it will
assist us in deriving some key combinatorial properties of the optimal solution
to the PMST problem.

4      The  Complexity  of  the  PMST  Problem 

In this section we prove that the simplest possible case of the PMST problem,
with equal weights $c(e) = 1$ and $p_i = p$, is NP-complete. We first define
formally the decision version of this restricted problem.

The Restricted PMST Problem (RPMST)
Instance: A graph $G = (V, E)$, costs $c(e) = 1$ for all $e \in E$, a rational
number $p$, $0 < p < 1$, and a bound $B$.
Question: Is there a spanning tree $T$ for $G$ with

$$E[L_T] = \sum_{e \in T} c(e)\{1 - (1-p)^{|K_e|}\}\{1 - (1-p)^{n - |K_e|}\} \le B?$$

In order to prove that the RPMST problem is NP-complete we will need
some properties of the function $\phi(k) = (1 - x^k)(1 - x^{n-k})$, $x = 1 - p$,
defined in (5).

Proposition 3
The function $\phi(k)$ has the following properties:

1. If $k < m \le n/2$, then $\phi(k) < \phi(m)$.

2. $\phi(k + m) < \phi(k) + \phi(m)$.

3. $3\phi(3) - 2\phi(4) > 0$.

Proof:
These properties follow easily from elementary algebraic manipulations as
follows:

1. $\phi(k) - \phi(m) = (x^m - x^k)(1 - x^{n-m-k}) < 0$ if $k < m$ and
$m + k < n/2 + n/2 = n$.

2. $\phi(k) + \phi(m) - \phi(k + m) = (1 - x^k)(1 - x^m)(1 + x^{n-k-m}) > 0$.

3. $3\phi(3) - 2\phi(4) = 3(1 - x^3)(1 - x^{n-3}) - 2(1 - x^4)(1 - x^{n-4}) > (1 - x^{n-4})(1 + 2x^4 - 3x^3) = (1 - x^{n-4})(1 - x)[1 + x(1 - x^2) + x^2(1 - x)] > 0$.

•
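
The three properties can also be checked numerically for particular values of $n$ and $p$. The Python sketch below is a sanity check only, not a proof; the function names are hypothetical, and for $p$ very close to 1 floating-point rounding can hide the strict inequalities.

```python
def phi(k, n, p):
    """phi(k) = (1 - x^k)(1 - x^(n-k)) with x = 1 - p, as in (5)."""
    x = 1.0 - p
    return (1.0 - x**k) * (1.0 - x**(n - k))

def check_proposition_3(n, p):
    """Return three booleans, one per property of Proposition 3 (needs n >= 5)."""
    half = n // 2
    # Property 1: phi is increasing on 1, ..., floor(n/2).
    monotone = all(phi(k, n, p) < phi(k + 1, n, p) for k in range(1, half))
    # Property 2: phi(k + m) < phi(k) + phi(m) whenever k + m < n.
    subadditive = all(
        phi(k + m, n, p) < phi(k, n, p) + phi(m, n, p)
        for k in range(1, n) for m in range(1, n - k)
    )
    # Property 3: 3 phi(3) - 2 phi(4) > 0.
    third = 3 * phi(3, n, p) - 2 * phi(4, n, p) > 0
    return monotone, subadditive, third
```

For example, `check_proposition_3(21, 0.5)` evaluates the three conditions for the value of $n$ that arises from the example instance of the reduction used later in this section.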

We now have all the required tools to prove that the RPMST problem is
NP-complete.
Theorem 4
The RPMST problem is NP-complete.
Proof:

Clearly, RPMST belongs to the class NP, since given a tree $T$ we can
compute $E[L_T]$ in polynomial time ($O(n)$) and compare it with the given bound
$B$. In order to prove the completeness of the problem we will reduce the
NP-complete problem EXACT COVER BY 3-SETS (Garey and Johnson
[3]) to it.

EXACT COVER BY 3-SETS (E-3C)
Instance: A family $S = \{\sigma_1, \ldots, \sigma_s\}$ of 3-element subsets of
a set $C = \{c_1, \ldots, c_{3c}\}$.
Question: Is there a subfamily $S_1 \subseteq S$ of pairwise disjoint sets such
that $\bigcup_{\sigma \in S_1} \sigma = C$?

Given an instance of the E-3C problem, we define the following instance
of the RPMST problem:
$G = (V, E)$,
$V = R \cup S \cup C$,
$R = \{a_0, \ldots, a_r\}$,
$r = s + 3c$,
$E = \{(a_i, a_0),\ i = 1, \ldots, r\} \cup \{(a_0, \sigma),\ \sigma \in S\} \cup \{(\sigma, c),\ c \in \sigma\}$,
$p$ arbitrary rational with $0 < p < 1$,
$B = (r + 3c)\phi(1) + c\,\phi(4)$,
$\phi(k) = (1 - x^k)(1 - x^{n-k})$, $x = 1 - p$, $n = r + 1 + s + 3c$.
As an example, if $S = \{\{c_1, c_2, c_3\}, \{c_2, c_3, c_5\}, \{c_2, c_4, c_5\}, \{c_4, c_5, c_6\}\}$,
$c = 2$, $s = 4$, the corresponding graph is presented in Figure 3.
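
The construction above can be written out mechanically. The following Python sketch builds the node set, the edge set and the bound $B$ exactly as they are stated in the text; the node labels `("a", i)`, `("s", k)`, `("c", element)` and the function name `build_rpmst_instance` are hypothetical choices made for illustration only.

```python
def build_rpmst_instance(family, universe, p):
    """Build the RPMST instance for an E-3C instance.

    family:   list of 3-element sets (the sets sigma_1, ..., sigma_s)
    universe: list of the 3c elements of C
    p:        presence probability, 0 < p < 1
    Returns (nodes, edges, bound); all edge costs are 1.
    """
    s = len(family)
    three_c = len(universe)
    assert three_c % 3 == 0 and all(len(sig) == 3 for sig in family)
    c = three_c // 3
    r = s + three_c

    nodes = ([("a", i) for i in range(r + 1)]
             + [("s", k) for k in range(s)]
             + [("c", el) for el in universe])

    edges = [(("a", i), ("a", 0)) for i in range(1, r + 1)]        # (a_i, a_0)
    edges += [(("a", 0), ("s", k)) for k in range(s)]              # (a_0, sigma)
    edges += [(("s", k), ("c", el))                                # (sigma, c)
              for k, sig in enumerate(family) for el in sorted(sig)]

    n = len(nodes)                       # equals r + 1 + s + 3c
    x = 1.0 - p

    def phi(k):
        return (1.0 - x**k) * (1.0 - x**(n - k))

    bound = (r + three_c) * phi(1) + c * phi(4)   # B as given in the text
    return nodes, edges, bound
```

For the example family above (with the elements written as the integers 1 to 6) this yields $r = 10$ and a graph on $n = 21$ nodes.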

Figure 3: Equivalent instances of E-3C and RPMST

Let $T$ be a feasible ($E[L_T] \le B$) spanning tree of $G$. Clearly
$(a_i, a_0) \in T$ for $i = 1, \ldots, r$, since these are the only edges
incident to the nodes $a_i$. We now show that if $E[L_T] \le B$, then
$(a_0, \sigma) \in T$ for all $\sigma \in S$. Suppose first there exists only
one $(a_0, \sigma) \notin T$ for some $\sigma \in S$. We will show that
$E[L_T] > B$. Since $(a_0, \sigma) \notin T$, there exist $i \in S$ and
$j \in C$ such that $(a_0, i), (i, j), (j, \sigma) \in T$ (see Figure 4a). We
define
$g_l$ = the number of nodes in $C - \{j\}$ that are adjacent to $l$ in $T$. In
the example in Figure 4a, $g_i = 1$, $g_\sigma = 2$.
We also define
$s_l$ = the number of nodes in $S - \{i, \sigma\}$ in $T$ that are adjacent to
exactly $l$ vertices from $C$ in $T$ ($l = 0, 1, 2, 3$).
From these definitions we get

$$s_1 + 2s_2 + 3s_3 = 3c - g_i - g_\sigma - 1 \implies s_3 = \tfrac{1}{3}(3c - 2s_2 - s_1 - g_i - g_\sigma - 1). \qquad (7)$$

We now write an expression for $E[L_T]$:

$$E[L_T] = r\phi(1) + (3c - g_i - g_\sigma - 1)\phi(1) + s_1\phi(2) + s_2\phi(3) + s_3\phi(4) + \phi(g_i + g_\sigma + 3) + \phi(2 + g_\sigma) + \phi(1 + g_\sigma),$$

where the first term ($r\phi(1)$) is from the contributions of the $r$ edges
$(a_i, a_0)$, the second term is from the contributions of the edges connecting
the nodes in $C$ except the ones that are connected with $i$, $\sigma$, and the
terms $\phi(g_i + g_\sigma + 3)$, $\phi(2 + g_\sigma)$, $\phi(1 + g_\sigma)$ are
from the contributions of the edges $(a_0, i)$, $(i, j)$, $(j, \sigma)$
respectively. Then

$$E[L_T] > B = (r + 3c)\phi(1) + c\,\phi(4) \iff$$

$$s_1\phi(2) + s_2\phi(3) + s_3\phi(4) + \phi(g_i + g_\sigma + 3) + \phi(2 + g_\sigma) + \phi(1 + g_\sigma) - \phi(1) - c\,\phi(4) > 0.$$

Substituting (7) we get

$$E[L_T] > B \iff \tfrac{1}{3}s_1[3\phi(2) - \phi(4)] + \tfrac{1}{3}s_2[3\phi(3) - 2\phi(4)] + \tfrac{1}{3}[3\phi(g_i + g_\sigma + 3) - (g_i + g_\sigma)\phi(4) + \phi(2 + g_\sigma)] + \tfrac{1}{3}[2\phi(2 + g_\sigma) - \phi(4)] + [\phi(1 + g_\sigma) - \phi(1)] > 0.$$

Using proposition 3 we can easily check that all the terms in [ ] are strictly
positive and thus $E[L_T] > B$.

Figure 4: The cases $(a_0, \sigma) \notin T$ and $(a_0, \sigma_1), \ldots, (a_0, \sigma_k) \notin T$

Suppose now that there are $(a_0, \sigma_1), \ldots, (a_0, \sigma_k) \notin T$
(see Figure 4b). Since $T$ is a tree, there exist $i \in S$ and $j \in C$ such
that $(a_0, i), (i, j), (j, \sigma_k) \in T$. Then if we add the edge
$(a_0, \sigma_k)$ and delete the edge $(j, \sigma_k)$ we get a new tree
$T_{k-1}$, in which there are only $k - 1$ nodes $\sigma_1, \ldots, \sigma_{k-1}$
not connected with $a_0$. If we denote the tree $T$ with $T_k$ in order to
represent the fact that there are $k$ nodes in $T$ not connected to $a_0$, we
claim that

$$E[L_{T_k}] > E[L_{T_{k-1}}].$$

Let $u_i$, $u_j$, $u_{\sigma_k}$ be the number of nodes in the subtrees from
nodes $i$, $j$, $\sigma_k$ respectively (see also Figure 4b). The contribution
of edges in $T_k$, $T_{k-1}$ that are not involved in the cycle created by
adding the edge $(a_0, \sigma_k)$ is the same. Then

$$E[L_{T_k}] - E[L_{T_{k-1}}] = \phi(u_i + 1 + u_j + 1 + u_{\sigma_k} + 1) + \phi(u_j + 1 + u_{\sigma_k} + 1) + \phi(u_{\sigma_k} + 1) - \phi(u_i + 1 + u_j + 1) - \phi(u_j + 1) - \phi(u_{\sigma_k} + 1),$$

where $\phi(u_i + 1 + u_j + 1 + u_{\sigma_k} + 1)$,
$\phi(u_j + 1 + u_{\sigma_k} + 1)$, $\phi(u_{\sigma_k} + 1)$ are the
contributions in $T_k$ of $(a_0, i)$, $(i, j)$ and $(j, \sigma_k)$ respectively.
Similarly in $T_{k-1}$ the terms $\phi(u_i + 1 + u_j + 1)$, $\phi(u_j + 1)$ and
$\phi(u_{\sigma_k} + 1)$ are from $(a_0, i)$, $(i, j)$ and $(a_0, \sigma_k)$
respectively. By proposition 3 we have that
$\phi(u_i + 1 + u_j + 1 + u_{\sigma_k} + 1) > \phi(u_i + 1 + u_j + 1)$ and
$\phi(u_j + 1 + u_{\sigma_k} + 1) > \phi(u_j + 1)$. As a result,
$E[L_{T_k}] > E[L_{T_{k-1}}]$. Note that we have used the fact $r = s + 3c$,
since in order for proposition 3 to hold we need
$u_i + 1 + u_j + 1 + u_{\sigma_k} + 1 \le s + 3c \le n/2 = (r + 1 + s + 3c)/2$.

As a result, the expected cost of $T$ decreases by adding one missing arc
$(a_0, \sigma_k)$. Making this transformation inductively we find:

$$E[L_T] = E[L_{T_k}] > E[L_{T_{k-1}}] > \cdots > E[L_{T_1}].$$

But since the tree $T_1$ has only one missing arc $(a_0, \sigma_1)$, we have
already proved that in this case $E[L_{T_1}] > B$.

Therefore, it follows that for the tree $T$ to be feasible all edges
$(a_0, \sigma_i) \in T$. We will now show that

$$E[L_T] \le B \iff \text{E-3C has a solution.}$$

By using the quantities $s_l$ ($l = 0, 1, 2, 3$) defined above we have

$$s_1 + 2s_2 + 3s_3 = 3c, \qquad s_0 + s_1 + s_2 + s_3 = s.$$

The expected cost of $T$ is then given by

$$E[L_T] = (r + 3c)\phi(1) + s_1\phi(2) + s_2\phi(3) + s_3\phi(4).$$

Thus

$$E[L_T] \le B \iff s_1\phi(2) + s_2\phi(3) + (s_3 - c)\phi(4) \le 0 \iff \tfrac{1}{3}s_1[3\phi(2) - \phi(4)] + \tfrac{1}{3}s_2[3\phi(3) - 2\phi(4)] \le 0. \qquad (8)$$

From proposition 3, $3\phi(2) - \phi(4) > 0$ and $3\phi(3) - 2\phi(4) > 0$. As a
result, inequality (8) holds if and only if $s_1 = s_2 = 0$ and hence
$s_3 = c$, which is equivalent to E-3C having a solution. Thus
$E[L_T] \le B \iff$ E-3C has a solution, and hence the RPMST problem is
NP-complete. •

We can add some insight into why the problem is hard by noticing the
following remarkable fact. As $p \to 1$ the PMST approaches the MST, which
is easily solvable. What is the limit as $p \to 0$? In this case

$$\phi(k) = (1 - (1-p)^k)(1 - (1-p)^{n-k}) \sim p^2 k(n - k).$$

As a result,

$$E[L_T] \sim p^2 \sum_{e \in T} c(e)\,|K_e|\,(n - |K_e|).$$

The expression $\sum_{e \in T} c(e)|K_e|(n - |K_e|)$ is the objective function
of another famous problem, the NETWORK DESIGN PROBLEM on a tree, which is
defined as follows:

NETWORK DESIGN PROBLEM
Instance: A graph $G = (V, E)$, a weight $c(e)$ for each $e \in E$ and a bound
$B$.
Question: Is there a spanning tree $T$ for $G$ such that, if $W(\{u, v\})$
denotes the sum of the weights of the edges on the path joining $u$ and $v$ in
$T$, then

$$f(T) = \sum_{u, v \in V} W(\{u, v\}) \le B?$$

It is easily seen by considering the contribution of every edge $e$ that
$f(T) = \sum_{e \in T} c(e)|K_e|(n - |K_e|)$. The network design problem on a
tree was proved NP-complete in Johnson, Lenstra and Rinnooy Kan [6]. Thus the
PMST problem approaches as $p \to 0$ an NP-complete problem, which gives some
intuition as to why the problem is hard. In fact, it is this observation that
originally led us to suspect that the PMST problem is hard.
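
Both the identity for $f(T)$ and the small-$p$ behaviour of $\phi$ are easy to check numerically on a small tree. The Python sketch below is illustrative only; it assumes nodes labelled `0..n-1`, edges given as `(u, v, cost)` triples, a sum over unordered pairs $\{u, v\}$, and hypothetical function names.

```python
def path_weight_sum(n, tree_edges):
    """f(T): sum over unordered pairs {u, v} of the weight of the tree path."""
    adj = [[] for _ in range(n)]
    for u, v, c in tree_edges:
        adj[u].append((v, c))
        adj[v].append((u, c))
    total = 0.0
    for src in range(n):
        dist = {src: 0.0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, c in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + c
                    stack.append(v)
        total += sum(dist.values())
    return total / 2          # each unordered pair was counted from both ends

def edge_split_sum(n, tree_edges):
    """Sum over the edges e of T of c(e) * |K_e| * (n - |K_e|)."""
    adj = [[] for _ in range(n)]
    for u, v, _ in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0.0
    for u, v, c in tree_edges:
        # Count the nodes reachable from u without crossing the edge (u, v).
        seen = {u}
        stack = [u]
        while stack:
            a = stack.pop()
            for b in adj[a]:
                if b not in seen and {a, b} != {u, v}:
                    seen.add(b)
                    stack.append(b)
        k = len(seen)
        total += c * k * (n - k)
    return total

def phi(k, n, p):
    x = 1.0 - p
    return (1.0 - x**k) * (1.0 - x**(n - k))
```

On any tree the two sums should agree, and for small $p$ the ratio `phi(k, n, p) / p**2` should be close to `k * (n - k)`.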

We have proved that the restricted version of the PMST with equal costs
on a non-complete graph is NP-complete. We now prove that even if the
graph is complete, but the costs are either small or large, the problem is still
hard.

The PMST problem on a complete graph
Instance: A complete graph $K_n$, a cost $c(e) \in \{1, M\}$ for each edge $e$,
a bound $B$ and a probability $p$, $0 < p < 1$.
Question: Is there a spanning tree $T$ with $E[L_T] \le B$?

Theorem 5
The PMST problem on a complete graph is NP-complete.
Proof:
Clearly the problem is in NP because of the closed form expressions we have
found. To prove that the problem is complete we use the same reduction as
in the proof of theorem 4. In order to make the graph complete we add the
remaining edges but with very high cost, i.e.

$$c(e) = \frac{(r + 3c)\phi(1) + c\,\phi(4) + 1}{\phi(1)}.$$

Then if we include any edge of this type, its contribution would be
$c(e)\phi(|K_e|) \ge c(e)\phi(1) = B + 1$, i.e. it cannot be included in the
tree. Therefore, the proof remains unchanged since edges with large costs never
appear in a tree with $E[L_T] \le B$. •

In section 6, by exploiting some combinatorial properties of the problem,
we examine some special cases of the PMST in which the problem can be
solved in polynomial time. For example we prove that in a complete graph
with all costs equal the problem is solvable in $O(n)$. The previous theorems
indicate that the problem is hard if either the graph is complete and the
costs are 1 or $M$ or the graph is non-complete but the costs are equal. If we
combine these two requirements (complete graph, equal costs) the problem
becomes easy.

5      Combinatorial  and  Functional  Properties
of  the  PMST

In this section we examine the case with equal probabilities $p_i = p$. In this
case we are trying to find a spanning tree that minimizes the expression

$$f(p) \triangleq \min_T f_T(p) = \min_T \Big\{ \sum_{e \in T} c(e)\,\phi(|K_e|) \Big\}. \qquad (9)$$

5.1      Functional  Properties  of  the  PMST 

Expression (9) is clearly a function of the coverage probability $p$. For
different values of $p$ the corresponding optimal probabilistic trees which
minimize (9) are different. We first address the question of specifying the
properties of the function $f(p)$. From the results of section 4 we have seen
that it would be difficult to find $f(p)$ for a particular value of $p$, but can
we find some global properties of this function which will give some insight
into the problem? We call these properties functional because they are related
to the function $f(p)$. Some initial observations are stated in the following
proposition.

Proposition 6
The function $f(p)$ is continuous, increasing and piecewise differentiable. For
$np > 2$ it is also concave if the costs are positive.

Proof:
We examine the properties of the function

$$\phi_k(p) = \big(1-(1-p)^k\big)\big(1-(1-p)^{n-k}\big).$$

We can easily check that

$$\frac{d}{dp}\phi_k(p) = (1-p)^{k-1}\Big[k\big(1-(1-p)^{n-k}\big) + (n-k)(1-p)^{n-2k}\big(1-(1-p)^k\big)\Big] \ge 0,$$

$$\frac{d^2}{dp^2}\phi_k(p) = \begin{cases} \big[n(n-1)(1-p)^{n-k} - k(k-1) - (n-k)(n-k-1)(1-p)^{n-2k}\big](1-p)^{k-2}, & k \ge 2; \\ (n-1)(1-p)^{n-3}(2-np), & k = 1. \end{cases}$$

It can be easily checked that $\frac{d^2}{dp^2}\phi_k(p) \le 0$ for all $k \ge 2$ and
$\frac{d^2}{dp^2}\phi_1(p) \le 0$ if $np \ge 2$. Thus the function $f_T(p)$ is continuous
and differentiable since it is a polynomial, and furthermore it is increasing and
concave for $np \ge 2$, since it is a weighted sum with positive weights
($c(e) \ge 0$). Therefore, the function $f(p)$ is concave for $np \ge 2$ and
continuous, since it is the minimum of a finite number of concave and continuous
functions. Furthermore, $f(p)$ is increasing because for $p_1 < p_2$, if
$f(p_i) = f_{T_i}(p_i)$, $i = 1, 2$, then
$f(p_1) = f_{T_1}(p_1) \le f_{T_2}(p_1) \le f_{T_2}(p_2) = f(p_2)$. Finally, there
is a finite number of trees which can possibly minimize $f(p)$. Thus the function
$f(p)$ has a finite number of breakpoints. Between successive breakpoints
$p_i, p_{i+1}$ we have $f(p) = f_{T_i}(p)$, $p_i \le p \le p_{i+1}$, for some $T_i$.
Hence $f(p)$ is piecewise differentiable. •
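The sign conditions on the second derivative can be checked numerically. The following is a minimal Python sketch, not part of the original paper: it evaluates $\phi_k(p)$ and its second derivative in expanded form (obtained by differentiating $1 - x^k - x^{n-k} + x^n$ twice, with $x = 1-p$) and compares the result with a central finite difference. The function names and the choice n = 10 are illustrative assumptions.

```python
def phi(k, n, p):
    """phi_k(p) = (1 - (1-p)^k) * (1 - (1-p)^(n-k))."""
    x = 1.0 - p
    return (1.0 - x ** k) * (1.0 - x ** (n - k))


def d2_phi(k, n, p):
    """Second derivative of phi_k with respect to p, in expanded form:
    n(n-1)x^(n-2) - k(k-1)x^(k-2) - (n-k)(n-k-1)x^(n-k-2), x = 1 - p.
    Terms with a zero coefficient are skipped, so a negative exponent
    is never evaluated.
    """
    x = 1.0 - p
    total = 0.0
    for coeff, power in ((n * (n - 1), n - 2),
                         (-k * (k - 1), k - 2),
                         (-(n - k) * (n - k - 1), n - k - 2)):
        if coeff != 0:
            total += coeff * x ** power
    return total


def d2_numeric(k, n, p, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (phi(k, n, p + h) - 2.0 * phi(k, n, p) + phi(k, n, p - h)) / h ** 2


if __name__ == "__main__":
    n = 10  # illustrative size
    for p in (0.2, 0.5, 0.8):          # each satisfies n * p >= 2
        for k in range(1, n):
            print(n, k, p, d2_phi(k, n, p), d2_numeric(k, n, p))
```

For n = 10 and these sample values of p the second derivatives come out non-positive up to floating-point rounding, which is consistent with the concavity claim for $np \ge 2$.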

We can now combine proposition 6 with our previous observations that as
$p \to 1$ the PMST tends to the MST, i.e. the optimal tree for p close to 1 is
the MST, and as $p \to 0$ the optimal PMST is the solution to the network
design problem, to sketch a possible graph of the function $f(p)$ in Figure 5.

[Figure 5: The PMST problem as a function of the coverage probability p.
Plot of $f(p)$; labels: Network Design solution, MST, $L_{MST}$.]

5.2      Bounds for the PMST

Based on the above functional properties of $f(p)$ and some properties of
$\phi(k)$ from proposition 3 we can prove the following proposition.

Proposition 7
If $T_p$ is the optimal PMST and $L_T$ is the length of the tree T, then

$$\max\big[\,p L_{MST},\; p\big(1-(1-p)^{n-1}\big) L_{T_p}\big] \le E[L_{T_p}] \le \big(1-(1-p)^{\lfloor n/2 \rfloor}\big) L_{MST}. \qquad (10)$$

Proof:
From the concavity of the function $f(p)$ we get that

$$f(p) \ge p f(1) + (1-p) f(0) = p L_{MST},$$

where clearly $L_{MST}$ is the length of the minimum spanning tree, which is the
solution of PMST for p = 1. From proposition 3 we get

$$\phi(1) \le \phi(k) \le \phi(\lfloor n/2 \rfloor), \qquad 1 \le k \le n-1.$$

From the closed form expression (6) for $E[L_T]$ we find

$$\phi(1) L_T = \phi(1) \sum_{e \in T} c(e) \le E[L_T] = \sum_{e \in T} c(e)\,\phi(|K_e|) \le \phi(\lfloor n/2 \rfloor) L_T.$$

Since $E[L_{T_p}] \le E[L_{MST}]$, we easily derive (10). •
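The sandwich $\phi(1) L_T \le E[L_T] \le \phi(\lfloor n/2 \rfloor) L_T$ used in this proof can be illustrated on a small tree. The sketch below is a hedged illustration, assuming equal presence probabilities p and the closed form $E[L_T] = \sum_e c(e)\phi(|K_e|)$ quoted above; the helper names (`component_sizes`, `expected_length`) and the example tree are hypothetical.

```python
from collections import defaultdict


def phi(k, n, p):
    """phi(k) = (1 - (1-p)^k) * (1 - (1-p)^(n-k))."""
    x = 1.0 - p
    return (1.0 - x ** k) * (1.0 - x ** (n - k))


def component_sizes(n, edges):
    """For every tree edge (u, v, cost) return |K_e|, the number of nodes
    on the side of the edge that does not contain the root (node 0)."""
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {0: None}
    order = [0]
    for u in order:                  # the list grows while we scan it (BFS)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    size = {u: 1 for u in order}
    for u in reversed(order):        # children are visited before parents
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return [size[v] if parent[v] == u else size[u] for u, v, _ in edges]


def expected_length(n, edges, p):
    """E[L_T] = sum over edges of c(e) * phi(|K_e|)."""
    sizes = component_sizes(n, edges)
    return sum(c * phi(k, n, p) for (_, _, c), k in zip(edges, sizes))


if __name__ == "__main__":
    n = 6
    tree = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 4.0), (2, 4, 3.0), (4, 5, 1.5)]
    length = sum(c for _, _, c in tree)
    for p in (0.1, 0.5, 0.9):
        print(p, phi(1, n, p) * length, expected_length(n, tree, p),
              phi(n // 2, n, p) * length)
```

Each printed row lists the lower bound, the expected length and the upper bound for one value of p.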

Exploiting  these  bounds,  we  address  the  question  of  how  good  is  the  MST 
as  a  solution  to  the  PMST  problem.  The  following  is  an  obvious  corollary  of 
the  bounds  (10). 
Proposition 8

$$\frac{E[L_{MST}] - E[L_{T_p}]}{E[L_{T_p}]} \le \frac{(1-p)\big(1-(1-p)^{\lfloor n/2 \rfloor - 1}\big)}{p}. \qquad (11)$$

Proof:
Since $E[L_{T_p}] \le E[L_{MST}] \le \phi(\lfloor n/2 \rfloor) L_{MST} \le \big(1-(1-p)^{\lfloor n/2 \rfloor}\big) L_{MST}$, and
$E[L_{T_p}] \ge p L_{MST}$, we can easily derive (11). Note that as $p \to 0$ the bound
becomes O(n). •

[Figure 6: The trees $T_1$, $T_2$]

These bounds indicate that for p large enough (say $p \ge 1/2$) the MST
solution is a good approximation for the solution of the PMST problem,
which is consistent with our intuition. However, as $p \to 0$ and $n \to \infty$
this bound is not informative. In fact, the following example confirms our
intuition that the MST can be a very poor solution to the PMST problem.

Consider a complete graph $K_{n+1}$ with cost function $c(i, i+1) = 1$,
$i = 1, \dots, n$, and $c(e) = 2$ for all $e \ne (i, i+1)$. Note that the cost function
in this example satisfies the triangle inequality. If the tree $T_1$ is the path
$1, 2, \dots, n+1$ and $T_2$ is the star tree rooted at node n + 1 (see Figure 6),
then clearly $T_1$ is the MST. Then $E[L_{T_2}] = (2n-1)\phi(1)$ and

$$E[L_{T_1}] = 2\sum_{i=1}^{n/2} \phi(i) = n\big(1+(1-p)^{n+1}\big) - \frac{2(1-p)\big(1-(1-p)^{n}\big)}{p}$$

(assuming n is an even number). Then if $T_p$ is the minimal PMST, we obtain

$$\frac{E[L_{MST}]}{E[L_{T_p}]} \ge \frac{E[L_{T_1}]}{E[L_{T_2}]} = \frac{n\big(1+(1-p)^{n+1}\big) - 2(1-p)\big(1-(1-p)^{n}\big)/p}{(2n-1)\,p\,\big(1-(1-p)^{n}\big)}.$$

If $p = a/n$ for some constant $a \ge 2$ we easily obtain as $n \to \infty$ that

$$\frac{E[L_{MST}]}{E[L_{T_p}]} = \Omega(n).$$

Thus from (11) we always have

$$\frac{E[L_{MST}]}{E[L_{T_p}]} = O(n),$$

and we have found an example for which

$$\frac{E[L_{MST}]}{E[L_{T_p}]} = \Theta(n).$$

As a result, we conclude that the bound (11) is the best possible.
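The linear growth of the ratio in the path/star example can be observed numerically. The following Python sketch is an illustration under stated assumptions: it uses $\phi(k) = (1-(1-p)^k)(1-(1-p)^{n+1-k})$ for the graph on n + 1 nodes, sums $\phi$ over the path edges directly, and takes a = 2; the function names are hypothetical.

```python
def phi(k, m, p):
    """phi(k) for a graph on m nodes with presence probability p."""
    x = 1.0 - p
    return (1.0 - x ** k) * (1.0 - x ** (m - k))


def path_expected_length(n, p):
    """E[L_T1]: path 1, 2, ..., n+1 with unit edge costs (direct sum)."""
    return sum(phi(i, n + 1, p) for i in range(1, n + 1))


def path_closed_form(n, p):
    """Same quantity via the geometric-series closed form."""
    x = 1.0 - p
    return n * (1.0 + x ** (n + 1)) - 2.0 * x * (1.0 - x ** n) / p


def star_expected_length(n, p):
    """E[L_T2]: star rooted at node n+1, total cost 2(n-1) + 1 = 2n - 1."""
    return (2 * n - 1) * phi(1, n + 1, p)


def ratio(n, a=2.0):
    """E[L_T1] / E[L_T2] with p = a / n."""
    p = a / n
    return path_expected_length(n, p) / star_expected_length(n, p)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, ratio(n), ratio(n) / n)
```

With p = a/n the last printed column settles near a constant as n grows, which is the $\Omega(n)$ behaviour claimed for the ratio.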

Furthermore, we can address the opposite question: how good is the
PMST solution to the MST problem? Similarly we can show

Proposition 9

$$\frac{L_{T_p} - L_{MST}}{L_{MST}} \le \frac{1-p}{p\big(1-(1-p)^{n-1}\big)}. \qquad (12)$$

Proof:
Inequality (12) follows from the inequality (10) as follows:

$$p\big(1-(1-p)^{n-1}\big) L_{T_p} \le E[L_{T_p}] \le L_{MST}.$$ •

5.3      Relation of the PMST problem and Re-optimization Strategies

We have suggested in section 2 the idea that the PMST problem defines
an efficient strategy to update the solution to minimum spanning tree
problems, when problem instances are modified probabilistically because of the
absence of certain nodes from the graph. In section 2 we introduced two
other alternative strategies $\Sigma_{MST}$ and $\Sigma_{STEINER}$, in which we find the
minimum spanning tree (MST) or the minimum Steiner tree of the set of present
nodes in every instance. If $L_{MST}(S)$ and $L_{STEINER}(S)$ denote the length of
the MST or the Steiner tree respectively of the nodes in the set S, we then
define the expectation of these re-optimization strategies as follows:

$$E[\Sigma_{MST}] = \sum_{S \subseteq V} p(S)\, L_{MST}(S), \qquad (13)$$

$$E[\Sigma_{STEINER}] = \sum_{S \subseteq V} p(S)\, L_{STEINER}(S), \qquad (14)$$

where $p(S)$ was defined earlier to be the probability that only nodes in S are
present. In this section, we address the question of comparing the expectation
of the re-optimization strategies with the expectation of the PMST strategy.
In general it is difficult to find a closed form expression for $E[\Sigma_{MST}]$, since
we have to compute a sum of $O(2^n)$ terms. Instead, we will find a bound on
$E[\Sigma_{MST}]$.

Proposition 10
If every node is independently present with probability p then

$$E[\Sigma_{MST}] \ge \frac{np + (1-p)^n - 1}{n-1}\, L_{MST}. \qquad (15)$$

Proof:

$$E[\Sigma_{MST}] = \sum_{k=2}^{n} p^k (1-p)^{n-k} \sum_{S \subseteq V,\, |S| = k} L_{MST}(S).$$

We define $D_k = \sum_{S \subseteq V,\, |S| = k} L_{MST}(S)$ and thus

$$E[\Sigma_{MST}] = \sum_{k=2}^{n} p^k (1-p)^{n-k} D_k. \qquad (16)$$

We claim that

$$D_k \ge \binom{n}{k} \frac{k-1}{n-1}\, L_{MST}. \qquad (17)$$

We will prove the above claim by backward induction. Consider the n sets
$S_i = V - \{i\}$. Then

$$L_{MST}(S_i) + c(i,j) \ge L_{MST}(V) = L_{MST} \qquad \forall (i,j) \in MST, \qquad (18)$$

because the tree created by adding the edge $(i,j)$ to the MST on $S_i$ is a
feasible solution to the instance V. We apply (18) for $i = 1, \dots, n$ and, since
it holds for any edge in the MST, we choose for every i a corresponding
edge $(i, j_i)$ from the MST, such that n − 1 of the chosen edges are distinct and
the one edge that is chosen twice is the one with the minimum cost among all
edges in the MST. Summing over all i we get

$$\sum_{i=1}^{n} L_{MST}(S_i) + \sum_{i=1}^{n} c(i, j_i) \ge n L_{MST}.$$

In order to choose n − 1 edges $(i, j_i)$ to be distinct and the one remaining to
be the least in cost we perform the following algorithm:

1. Find the edge $e^*$ with smallest cost $c(e^*)$.

2. While more than two nodes remain: if i is a leaf in the MST, let $(i, j_i)$
   be the unique edge incident to i in the MST; if $(i, j_i) \ne e^*$, delete i.

3. For the two remaining nodes let $e^*$ be their corresponding edge.
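The three steps above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code: the tree is assumed to be given as a list of `(u, v, cost)` triples with at least one edge, and the name `assign_edges` is hypothetical.

```python
from collections import defaultdict


def assign_edges(mst_edges):
    """Assign one MST edge to every node so that n - 1 assigned edges are
    distinct and only the cheapest edge e* is used twice.

    mst_edges: list of (u, v, cost) triples forming a spanning tree.
    Returns (assignment, e_star), where assignment maps node -> edge.
    """
    adj = defaultdict(dict)
    for edge in mst_edges:
        u, v, _ = edge
        adj[u][v] = edge
        adj[v][u] = edge

    e_star = min(mst_edges, key=lambda e: e[2])         # step 1
    assignment = {}
    leaves = [u for u in adj if len(adj[u]) == 1]

    while leaves:                                       # step 2
        i = leaves.pop()
        j, edge = next(iter(adj[i].items()))            # unique edge at leaf i
        if edge == e_star:
            continue                                    # keep e* for step 3
        assignment[i] = edge
        del adj[i]                                      # delete leaf i
        del adj[j][i]
        if len(adj[j]) == 1:                            # j has become a leaf
            leaves.append(j)

    for i in adj:                                       # step 3: endpoints of e*
        assignment[i] = e_star
    return assignment, e_star


if __name__ == "__main__":
    tree = [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 2.0), (1, 4, 5.0)]
    assignment, e_star = assign_edges(tree)
    print(assignment, e_star)
```

A tree with at least three nodes always has a leaf whose edge is not $e^*$, so the loop can only stop once the two endpoints of $e^*$ are left; the assigned costs then sum to $L_{MST} + c(e^*)$.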

Since there are n − 1 edges $(i, j_i)$ that are distinct, then

$$\sum_{i=1}^{n} c(i, j_i) = L_{MST} + c(e^*) \le \Big(1 + \frac{1}{n-1}\Big) L_{MST}.$$

As a result,

$$D_{n-1} = \sum_{i=1}^{n} L_{MST}(S_i) \ge \Big(n - 1 - \frac{1}{n-1}\Big) L_{MST} = \frac{n(n-2)}{n-1}\, L_{MST}.$$

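Consider now the $t = \binom{n}{k}$ subsets of V of cardinality k, $A_1, A_2, \dots, A_t$. For
all $A_i$ let $A_{i,j} = A_i - \{j\}$. Arguing as before

$$\sum_{j \in A_i} L_{MST}(A_{i,j}) \ge \frac{k(k-2)}{k-1}\, L_{MST}(A_i).$$

Adding with respect to i we get

$$\sum_{i=1}^{t} \sum_{j \in A_i} L_{MST}(A_{i,j}) \ge \frac{k(k-2)}{k-1}\, D_k. \qquad (19)$$

But,

$$\sum_{i=1}^{t} \sum_{j \in A_i} L_{MST}(A_{i,j}) = (n-k+1)\, D_{k-1}, \qquad (20)$$

since in the summation in (20) we count each of the $\binom{n}{k-1}$ distinct
subsets of V of cardinality k − 1 exactly n − k + 1 times. Combining (19), (20) we
find

$$D_{k-1} \ge \frac{k(k-2)}{(k-1)(n-k+1)}\, D_k. \qquad (21)$$

Applying (21) inductively we easily obtain (17). Then from (16), (17) we find

$$E[\Sigma_{MST}] \ge \sum_{k=2}^{n} p^k (1-p)^{n-k} \binom{n}{k} \frac{k-1}{n-1}\, L_{MST}.$$

Therefore,

$$E[\Sigma_{MST}] \ge \frac{np + (1-p)^n - 1}{n-1}\, L_{MST}.$$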
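For small n the bound of proposition 10 can be compared with the exact expectation by enumerating all subsets. The sketch below is a hedged illustration: the random symmetric cost matrix, the seed and the helper names are assumptions made for the example, and the MST of each subset is computed with a simple version of Prim's algorithm.

```python
import itertools
import random


def mst_length(nodes, cost):
    """Length of the MST on `nodes` (Prim's algorithm, complete graph)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    # best[v] = cheapest known connection from v to the growing tree
    best = {v: cost[nodes[0]][v] for v in nodes[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for w in best:
            best[w] = min(best[w], cost[v][w])
    return total


def expected_reoptimized_mst(n, cost, p):
    """E[Sigma_MST]: sum over all subsets S of p(S) * L_MST(S)."""
    total = 0.0
    for k in range(2, n + 1):
        weight = p ** k * (1.0 - p) ** (n - k)
        for subset in itertools.combinations(range(n), k):
            total += weight * mst_length(subset, cost)
    return total


def lower_bound(n, cost, p):
    """(n p + (1-p)^n - 1) / (n - 1) * L_MST."""
    return (n * p + (1.0 - p) ** n - 1.0) / (n - 1) * mst_length(range(n), cost)


def random_costs(n, seed=0):
    """Hypothetical test instance: symmetric costs drawn from [1, 10]."""
    rng = random.Random(seed)
    cost = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        cost[i][j] = cost[j][i] = rng.uniform(1.0, 10.0)
    return cost


if __name__ == "__main__":
    n = 6
    cost = random_costs(n)
    for p in (0.1, 0.5, 0.9):
        print(p, lower_bound(n, cost, p), expected_reoptimized_mst(n, cost, p))
```

The enumeration visits $2^n$ subsets, so it is only meant for very small instances.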


Note that as $n \to \infty$ (for fixed p) the bound becomes $E[\Sigma_{MST}] \ge p\, L_{MST}$.

It is not clear that $E[\Sigma_{MST}] \le E[L_{T_p}]$. In fact, we give an example where
$E[\Sigma_{MST}] > E[L_{T_p}]$. Let $G = (V, E)$ be a complete graph $K_n$ with
$c(i,j) = c_i + c_j$, $c_1 \le c_2 \le \dots \le c_n$. Then, the MST is the star tree
rooted at node 1. As will be shown in proposition 12, the optimal PMST is the
same star tree. As a result,

$$E[L_{T_p}] = p\big(1-(1-p)^{n-1}\big)\Big[(n-1)c_1 + \sum_{i=2}^{n} c_i\Big].$$

In this example we will be able to find a closed form expression for $E[\Sigma_{MST}]$
by exploiting the special structure of the cost function. If the ith node is
present and the 1st, ..., (i − 1)th nodes are not present, then the optimal tree is
the star tree rooted at node i. From this observation we can write a closed
form expression for $E[\Sigma_{MST}]$:

$$E[\Sigma_{MST}] = \sum_{i=1}^{n-1} p(1-p)^{i-1}\, E[L_{T_i} \mid \text{node } i \text{ is present}],$$

where $E[L_{T_i}]$ means the expected length in the PMST sense of the star tree
rooted at node i with leaves $i+1, \dots, n$. Since
$E[L_{T_i} \mid i \text{ is present}] = p L_{T_i} = p\big[(n-i)c_i + \sum_{k=i+1}^{n} c_k\big]$,
after some algebraic manipulations, we easily find that

$$E[\Sigma_{MST}] = p \sum_{i=1}^{n} c_i \big[\,p(n-i)(1-p)^{i-1} + 1 - (1-p)^{i-1}\big].$$
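The closed form for $E[\Sigma_{MST}]$ in this example can be cross-checked against a direct sum over all subsets. The Python sketch below is illustrative and rests on the observation stated in the text: for $c(i,j) = c_i + c_j$ with the $c_i$ sorted in increasing order, the MST of a subset is the star rooted at its lowest-indexed node. The function names and the sample costs are assumptions.

```python
import itertools


def closed_form(c, p):
    """p * sum_i c_i [ p (n-i) (1-p)^(i-1) + 1 - (1-p)^(i-1) ], i = 1..n."""
    n = len(c)
    x = 1.0 - p
    return p * sum(
        c[i - 1] * (p * (n - i) * x ** (i - 1) + 1.0 - x ** (i - 1))
        for i in range(1, n + 1)
    )


def brute_force(c, p):
    """Sum of p(S) * L_MST(S) over all subsets S with at least two nodes.

    Assumes c is sorted ascending, so the MST of S is the star rooted at
    the lowest-indexed node of S.
    """
    n = len(c)
    total = 0.0
    for k in range(2, n + 1):
        weight = p ** k * (1.0 - p) ** (n - k)
        for subset in itertools.combinations(range(n), k):
            root, leaves = subset[0], subset[1:]   # combinations are sorted
            total += weight * sum(c[root] + c[j] for j in leaves)
    return total


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.5, 4.0, 6.0]              # hypothetical node weights
    for p in (0.3, 0.7):
        print(p, closed_form(costs, p), brute_force(costs, p))
```

The two columns agree up to rounding, which supports the coefficient $p(n-i)(1-p)^{i-1} + 1 - (1-p)^{i-1}$ attached to each $c_i$.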

Choosing $c_i = i$ we find

£[Ir,l  =  "'"^''-V  +  n-?  +  0((l-pr). 

E[^mst]  =  "'"^f  "%  +  0((1  -  p)"). 
Letting $np = c$ and $n \to \infty$ we see

$$E[L_{T_p}] \to (c/2 + 1 - 3/c)\,n, \qquad E[\Sigma_{MST}] \to cn/2.$$

Thus as $n \to \infty$, $E[L_{T_p}] > E[\Sigma_{MST}]$ for $c > 3$, but
$E[L_{T_p}] < E[\Sigma_{MST}]$ for $c < 3$.


6      Some  Special  Cases 

In  this  section  we  exploit  some  of  the  combinatorial  properties,  which  were 
proved  in  section  5,  to  find  some  special  cases  in  which  we  can  solve  the 
PMST  problem  in  polynomial  time.  In  section  4  we  have  seen  that  the  more 
restricted versions of the PMST problem with $c(e) = 1$ in a non-complete
graph and $c(e) \in \{1, M\}$ in a complete graph are NP-complete.

6.1      The  Role  of  the  Star  Tree 

The first natural question concerns the complexity of the problem when we
combine the above restrictions, i.e. when we have a complete graph with all
costs $c(e) = 1$. We prove a more general theorem, which includes this case
and characterizes the optimal solution.

Theorem 11
In the case where $p_i = p$ for all $i \in V$, whenever the optimum solution of
the MST problem is a star tree $T_*$, then $T_*$ is also the solution to the PMST
problem.

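Proof:
For all trees T

$$E[L_T] = \sum_{e \in T} c(e)\,\phi(|K_e|) \ge \phi(1)\, L_T$$

from proposition 3. But

$$E[L_{T_*}] = \phi(1)\, L_{T_*},$$

since every edge of a star tree separates a single node from the rest. Since
$T_*$ is by assumption the MST, $L_T \ge L_{T_*}$ for all trees. Combining the
above inequalities

$$E[L_{T_*}] \le E[L_T].$$

Therefore, the star tree $T_*$ solves the PMST problem. •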

Theorem 11 characterizes the optimal solution whenever the MST is a
star tree $T_*$. But are there interesting examples in which $T_*$ is the MST?

Proposition 12
In the following examples the MST is a star tree $T_*$ and thus, by theorem
11, $T_*$ is the PMST.

1. A complete graph, with $c(i,j) = c_i + c_j$.

2. A complete graph, with $c(i,j) = c_i c_j$, $c_i \ge 0$.

3. A complete graph, with $c(i,j) = c_i + c_j + d_i d_j$, with $c_1 = \min_i c_i$ and
   $d_1 = \min_i d_i$.

4. A complete graph, with $c(i,j) = \min(c_i, c_j)$.

Proof:
Consider an arbitrary tree T. Without loss of generality we assume that
$c_1 = \min_i c_i$. Since T is connected its cost is $L_T = c(2, i_2) + \dots + c(n, i_n)$,
where at least one $i_j$ is 1. Then
$L_T = c_2 + c_{i_2} + \dots + c_n + c_{i_n} \ge (n-1)c_1 + c_2 + \dots + c_n = L_{T_*}$,
i.e. $T_*$ is the MST, with $T_*$ rooted at node 1.

With exactly the same argument we can prove that $T_*$ is the MST in the
other cases. •

A corollary of theorem 11 is that in a complete graph with $c(e) = 1$ the
MST is a star tree $T_*$ and thus $T_*$ is also the PMST. Hence, in this case the
optimal solution can be found in O(n) time.
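For very small n, theorem 11 and proposition 12 can be checked by exhaustive search over all labeled spanning trees. The following Python sketch is an illustration under stated assumptions: equal presence probabilities p, the closed form $E[L_T] = \sum_e c(e)\phi(|K_e|)$, trees enumerated through Pruefer sequences, and hypothetical helper names and node weights.

```python
import itertools


def phi(k, n, p):
    """phi(k) = (1 - (1-p)^k) * (1 - (1-p)^(n-k))."""
    x = 1.0 - p
    return (1.0 - x ** k) * (1.0 - x ** (n - k))


def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence into the edge list of a labeled tree."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]   # the last two nodes
    edges.append((u, w))
    return edges


def side_size(edges, removed, n):
    """Number of nodes reachable from removed[0] once `removed` is cut."""
    adj = {u: [] for u in range(n)}
    for a, b in edges:
        if (a, b) != removed:
            adj[a].append(b)
            adj[b].append(a)
    seen, stack = {removed[0]}, [removed[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


def expected_length(edges, cost, n, p):
    """E[L_T] = sum over edges of c(e) * phi(|K_e|)."""
    return sum(cost(u, v) * phi(side_size(edges, (u, v), n), n, p)
               for u, v in edges)


def best_tree(n, cost, p):
    """Tree of minimum expected length among all n^(n-2) labeled trees."""
    trees = (prufer_to_edges(seq, n)
             for seq in itertools.product(range(n), repeat=n - 2))
    return min(trees, key=lambda t: expected_length(t, cost, n, p))


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0, 5.0]                    # hypothetical node weights
    print(best_tree(5, lambda i, j: w[i] + w[j], 0.4))
```

With the sample weights the search returns the star rooted at the node of smallest weight (node 0 in the zero-based labelling used here).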

6.2      The Case $p_i \ne p_j$

We have shown that the optimum PMST, in the case $p_i = p$, is a star tree
$T_*$, whenever $T_*$ is the MST. Does this result continue to hold even in the
case $p_i \ne p_j$? The following theorem answers this question.

Theorem 13
If the probability of presence of node i is $p_i$, $p_1 = \min_i p_i$ and the MST is a
star tree $T_*$ rooted at node 1, then $T_*$ is the PMST.

Proof:
From the closed form expression (3) for any tree T

$$E[L_T] = \sum_{e \in T} c(e) \Big\{1 - \prod_{i \in K_e} (1-p_i)\Big\} \Big\{1 - \prod_{i \in V - K_e} (1-p_i)\Big\}.$$

Let $x_i = 1 - p_i$ and without loss of generality we assume that $1 \in K_e$. But

$$\Big(1 - \prod_{i \in K_e} x_i\Big)\Big(1 - \prod_{i \in V - K_e} x_i\Big) - (1 - x_1)\Big(1 - \prod_{i=2}^{n} x_i\Big) = \Big(x_1 - \prod_{i \in V - K_e} x_i\Big)\Big(1 - \prod_{i \in K_e - \{1\}} x_i\Big) \ge 0,$$

because $\prod_{i \in K_e - \{1\}} x_i \le 1$ and $x_1 \ge x_i \ge \prod_{i \in V - K_e} x_i$.
As a result,

$$E[L_T] \ge p_1 \Big(1 - \prod_{i=2}^{n} (1-p_i)\Big) \sum_{e \in T} c(e).$$

By the assumption that $L_{T_*} \le L_T$ we find that

$$E[L_T] \ge E[L_{T_*}].$$

The star tree $T_*$ is the PMST. •

As a corollary, in a complete graph with $c(e) = 1$ the PMST is the star tree
rooted at node l, where $p_l = \min_i p_i$.


6.3      Sensitivity  Analysis 

We investigate next the conditions under which the star tree $T_*$, which was
optimal for certain special cases, remains optimal in the case $p_i = p$ when
the cost function is arbitrary.

We define a node in a tree to be an outer node if the degree of the node
is one, and an inner node if the degree of the node is two or more. If we erase
all outer nodes from a tree of n nodes then the remaining graph is again a tree
formed by inner nodes. This tree will be referred to as the inner tree $T_I$. In
the tree $T_I$, there again must be some nodes of degree one and these nodes
are called extreme inner nodes.

Theorem 14
Let a, b, c be the costs of three sides of any triangle, i.e. any set of three
nodes, in the n-node network ($n \ge 4$), with $a \le b \le c$. If there exists a
positive t with

$$t \le \frac{(1-p)\big(1-(1-p)^{\lfloor n/2 \rfloor - 1}\big)^2}{p\,\big(\lfloor n/2 \rfloor - 1\big)\big(1-(1-p)^{n-1}\big)} = O\Big(\frac{1}{n}\Big),$$

such that

$$a + tb \ge c$$

for all triangles in the network, then there exists a PMST which is a star
tree.

Proof:
Note that the smaller the value of t, the more restrictive the inequality. If
t = 0, then it restricts all sides of any triangle to be of equal length. If t = 1
it reduces to the regular triangle inequality. Since the value of t must be
less than 1, this condition is stronger than the triangle inequality, i.e. more
restrictive.

It is sufficient to show that we can reduce the number of inner nodes in
any spanning tree, which is not a star tree, without increasing the expected
cost. So, let T be any spanning tree which contains at least two inner nodes.
Let $N_q$ be an extreme inner node in T with a neighbor $N_p$ which is an inner
node. Since $N_q$ is an extreme inner node, all its neighbors except $N_p$ must
be of degree one in T. Call these nodes $N_1, \dots, N_{k-1}$. This is shown in Figure
7, where the distance between $N_p$ and $N_i$ is denoted by $c_i$.

[Figure 7: The tree T]

Let us construct a new spanning tree $T_1$ which is the same as T except
that the nodes $N_i$ ($i = 1, \dots, k-1$) are connected to $N_p$ directly. In the new
tree $T_1$, $N_q$ is no longer an inner node. Thus the number of inner nodes has
decreased by one. We will show that

$$E[L_{T_1}] \le E[L_T].$$

If we apply this idea recursively to the trees $T_1, T_2, \dots$, we will finally get a
tree $T_*$ which is a star tree. Since the part of the tree to the left of $N_p$ is
exactly the same for both T and $T_1$, we need only calculate the expected cost
for the part to the right of $N_p$. Here $b_0$ is the cost of the edge $(N_q, N_p)$ and
$a_i$ is the cost of the edge $(N_q, N_i)$ in T. If $x = 1 - p$, then

$$E[L_T] - E[L_{T_1}] = b_0 (1-x^k)(1-x^{n-k}) + \sum_{i=1}^{k-1} a_i (1-x)(1-x^{n-1}) - b_0 (1-x)(1-x^{n-1}) - \sum_{i=1}^{k-1} c_i (1-x)(1-x^{n-1}).$$

The net decrease of expected cost in changing from T to $T_1$ is

$$E[L_T] - E[L_{T_1}] = (1-x)(1-x^{n-1}) \sum_{i=1}^{k-1} \Big[a_i - c_i + b_0\, \frac{x (1-x^{k-1})(1-x^{n-k-1})}{(k-1)(1-x)(1-x^{n-1})}\Big].$$

The decrease will be positive if each term is positive, namely

$$a_i + b_0\, \frac{x (1-x^{k-1})(1-x^{n-k-1})}{(k-1)(1-x)(1-x^{n-1})} \ge c_i.$$

For a fixed n, $x(1-x^{k-1})(1-x^{n-k-1}) / \big((k-1)(1-x)(1-x^{n-1})\big)$ is smallest
when k is largest, that is when $k = \lfloor n/2 \rfloor$. Thus it is sufficient to have

$$a_i + b_0\, \frac{(1-p)\big(1-(1-p)^{\lfloor n/2 \rfloor - 1}\big)^2}{p\,\big(\lfloor n/2 \rfloor - 1\big)\big(1-(1-p)^{n-1}\big)} \ge c_i.$$

Assuming $a_i \le b_0 \le c_i$ gives the strongest inequality, and we have the
statement of the theorem. •
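The condition of theorem 14 is straightforward to test on a given cost matrix. The sketch below is a hedged illustration: it reads the limit on t with the factor $1-(1-p)^{\lfloor n/2 \rfloor - 1}$ squared (an assumption about the garbled original, chosen because it is a valid lower bound on the coefficient derived in the proof), and the function names are hypothetical.

```python
import itertools


def star_threshold(n, p):
    """Limit on t for n >= 4 and 0 < p < 1 (squared-factor reading):
    (1-p) (1 - (1-p)^m)^2 / (p m (1 - (1-p)^(n-1))), m = floor(n/2) - 1.
    """
    m = n // 2 - 1
    x = 1.0 - p
    return x * (1.0 - x ** m) ** 2 / (p * m * (1.0 - x ** (n - 1)))


def satisfies_condition(cost, t):
    """True if a + t*b >= c for every triangle with side costs a <= b <= c."""
    n = len(cost)
    for i, j, k in itertools.combinations(range(n), 3):
        a, b, c = sorted((cost[i][j], cost[j][k], cost[i][k]))
        if a + t * b < c:
            return False
    return True


if __name__ == "__main__":
    n, p = 10, 0.5
    t = star_threshold(n, p)
    equal = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    print(t, satisfies_condition(equal, t))
```

When `satisfies_condition(cost, star_threshold(n, p))` holds, the theorem guarantees that some star tree is a PMST; finding which root is best then takes one pass over the n candidate stars.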


7      Some  Concluding  Remarks 

We have seen that a natural probabilistic variation of a classical combinatorial
problem has the potential to model various practical situations, offers an
alternative way to update solutions to problem instances which are modified
probabilistically and leads to very different properties in comparison with
its deterministic counterpart. The simplest possible version of the PMST
problem was proved to be NP-complete, in sharp contrast with the fact
that the MST problem is solved by a greedy, most straightforward algorithm.
Surprisingly, our analysis of the combinatorial properties of the problem
established some interesting connections with the network design problem
and naturally with the MST. In particular, as the probability of presence p
tends to 0, the PMST approaches the solution to the network design problem.
This limiting behavior suggests the idea of solving the network design
problem as a sequence of PMST problems, which is a topic of future research.
As a final step, we examined the special role of the star tree, which can be
the solution of the PMST problem under some conditions.

As a general conclusion, probabilistic variations of classical combinatorial
optimization problems raise interesting and entirely new questions compared
with their deterministic counterparts and, in addition, understanding
of the properties of the probabilistic problem can add insight to
deterministic problems, as was the case with the network design problem.